China TCM News Article Scraper

Pricing: Pay per event

Scrapes articles from 中国中医药网 (cntcm.com.cn) — the official journal of China's National Administration of Traditional Chinese Medicine. Extracts title, author, publish date, source edition, full article body, and metadata tags (TCM topics, related herbs, integrative keywords) for each article.

Developer: BowTiedRaccoon (maintained by Community)

Scrape full-text articles from 中国中医药网 (cntcm.com.cn) — the official journal of China's National Administration of Traditional Chinese Medicine. Extracts article title, author, publish date, source edition, and full article body (both plain text and HTML), plus keyword tags for TCM topics, integrative medicine crossovers, and related Chinese herbs.

What you get

Each scraped article includes:

  • article_id — unique identifier from the URL
  • title — article headline in Chinese
  • category — mapped to: policy_regulation, clinical_research, herb_pharmacology, traditional_practice, news_industry, yangsheng_wellness
  • publish_date — as printed on the page (YYYY-MM-DD)
  • source — newspaper edition or column (e.g., 中国中医药报7版)
  • author — byline
  • body_text — full article body as plain text
  • body_html — full article body as raw HTML
  • tcm_topics — detected TCM keywords (针灸, 中药, 养生, etc.)
  • integrative_keywords — crossover wellness terms (yoga, qigong, 瑜伽, etc.)
  • related_herbs — Chinese herb names cited in the article
  • source_url — canonical article URL
  • scraped_at — ISO timestamp
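
As a rough sketch of how a consumer might model these records, the snippet below declares one record as a Python `TypedDict` and adds a small filter. The field names come from the list above; the Python types (for example, lists of strings for the three tag fields) and the `filter_articles` helper are assumptions made for illustration, not something the Actor documents.

```python
from typing import List, TypedDict


class Article(TypedDict):
    """One dataset record. Field names follow the list above; types are assumed."""
    article_id: str
    title: str
    category: str
    publish_date: str            # YYYY-MM-DD, as printed on the page
    source: str
    author: str
    body_text: str
    body_html: str
    tcm_topics: List[str]        # assumed to be a list of keyword strings
    integrative_keywords: List[str]
    related_herbs: List[str]
    source_url: str
    scraped_at: str              # ISO timestamp


# Category values listed in the field description above.
CATEGORIES = {
    "policy_regulation", "clinical_research", "herb_pharmacology",
    "traditional_practice", "news_industry", "yangsheng_wellness",
}


def filter_articles(items, category, since):
    """Hypothetical helper: keep records in `category` dated on or after `since`."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    # Zero-padded YYYY-MM-DD strings sort chronologically, so comparing
    # the strings directly is sufficient here.
    return [
        a for a in items
        if a["category"] == category and a["publish_date"] >= since
    ]
```

For example, `filter_articles(items, "herb_pharmacology", "2024-01-01")` would keep only herb pharmacology articles published in 2024 or later.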

Use cases

  • Clinical research: Track TCM policy announcements, trial reports, and regulatory updates from China's primary source
  • Pharma regulatory monitoring: Monitor Chinese herbal medicine policy and approval news
  • Academic research: Sinology, comparative medicine, integrative health studies
  • LLM training corpora: High-quality Chinese medical text from an authoritative institutional source
  • Integrative medicine: TCM-yoga/qigong/meditation crossover content discovery

Inputs

Field       Type      Description                                                   Default
maxItems    Integer   Maximum articles to scrape (0 = no limit)                     10
startDate   String    Only articles published on or after this date (YYYY-MM-DD)
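
To make the two inputs concrete, here is a minimal sketch of a helper that builds and validates the input object before a run. The key names `maxItems` and `startDate` and the default of 10 come from the table above; the `build_input` function itself is hypothetical and not part of the Actor.

```python
import re
from datetime import date
from typing import Optional

_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def build_input(max_items: int = 10, start_date: Optional[str] = None) -> dict:
    """Hypothetical helper: build the Actor input, validating both fields."""
    if max_items < 0:
        raise ValueError("maxItems must be 0 (no limit) or a positive integer")
    run_input = {"maxItems": max_items}
    if start_date is not None:
        # Require the exact YYYY-MM-DD shape, then check it is a real date.
        if not _DATE_RE.match(start_date):
            raise ValueError("startDate must be formatted as YYYY-MM-DD")
        date.fromisoformat(start_date)  # raises ValueError for e.g. 2024-02-30
        run_input["startDate"] = start_date
    return run_input
```

The resulting dictionary could then be passed as `run_input` when starting the Actor, for instance through the Apify Python client's `client.actor(...).call(run_input=...)`. The Actor ID to use there is not stated in this listing.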

Notes

  • Discovery uses the site's comprehensive /sitemap.txt (more than 46,000 article URLs)
  • Server-rendered HTML — no JavaScript execution required
  • Polite crawl with modest concurrency (5 concurrent requests)
  • robots.txt is respected. The small number of sensitive articles listed under Disallow in robots.txt are not included in the sitemap, so sitemap-based discovery does not reach them
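
The notes above describe a three-part crawl pattern: read URLs from a plain-text sitemap, honor robots.txt, and cap concurrency at 5. The sketch below illustrates that pattern with the Python standard library only. It is not the Actor's actual implementation: the function names are hypothetical, the explicit robots.txt filter is an added safety check, and `fetch` stands for any caller-supplied coroutine that downloads one URL.

```python
import asyncio
from urllib.robotparser import RobotFileParser

MAX_CONCURRENCY = 5  # matches the concurrency stated in the notes


def parse_sitemap(text):
    """Return unique http(s) URLs from a plain-text sitemap (one URL per line)."""
    seen, urls = set(), []
    for line in text.splitlines():
        url = line.strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def allowed_urls(urls, robots_txt, user_agent="*"):
    """Drop URLs that the given robots.txt content disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


async def crawl(urls, fetch, limit=MAX_CONCURRENCY):
    """Run `fetch(url)` for every URL with at most `limit` requests in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

A caller would download the sitemap and robots.txt once, pass their text through `parse_sitemap` and `allowed_urls`, and then hand the remaining URLs to `crawl` together with an HTTP fetch coroutine of their choice.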